AITopics | online mirror descent

Online-Within-Online Meta-Learning

Neural Information Processing Systems | Dec-26-2025, 02:01:06 GMT

We study the problem of learning a series of tasks in a fully online Meta-Learning setting. The goal is to exploit similarities among the tasks to incrementally adapt an inner online algorithm in order to incur a low averaged cumulative error over the tasks. We focus on a family of inner algorithms based on a parametrized variant of online Mirror Descent. The inner algorithm is incrementally adapted by an online Mirror Descent meta-algorithm using the corresponding within-task minimum regularized empirical risk as the meta-loss. In order to keep the process fully online, we approximate the meta-subgradients by the online inner algorithm. An upper bound on the approximation error allows us to derive a cumulative error bound for the proposed method. Our analysis can also be converted to the statistical setting by online-to-batch arguments. We instantiate two examples of the framework in which the meta-parameter is either a common bias vector or feature map. Finally, preliminary numerical experiments confirm our theoretical findings.
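
The bias-vector instance described above can be illustrated with a small Euclidean sketch. It is a simplified reading rather than the paper's algorithm: the inner algorithm is plain online gradient descent regularized toward a bias `theta`, and the meta-gradient of the regularized empirical risk, `-lam * (w_theta - theta)`, is approximated by replacing the minimizer `w_theta` with the average inner iterate. The squared loss, the synthetic tasks, the step sizes, and every function name are assumptions made for illustration.

```python
import numpy as np

def inner_ogd(theta, stream, lam, eta):
    """Inner online algorithm (assumed form): gradient steps on
    0.5 * (a @ w - b) ** 2 + (lam / 2) * ||w - theta||^2, started at w = theta."""
    w = theta.copy()
    w_sum = np.zeros_like(theta)
    cumulative_loss = 0.0
    for a, b in stream:
        w_sum += w                      # average the iterates used for prediction
        residual = a @ w - b
        cumulative_loss += 0.5 * residual ** 2
        w = w - eta * (residual * a + lam * (w - theta))
    return w_sum / len(stream), cumulative_loss

def meta_learn_bias(tasks, dim, lam=1.0, eta=0.1, gamma=0.5):
    """Meta-algorithm (assumed form): one gradient step on the bias per task."""
    theta = np.zeros(dim)
    losses = []
    for stream in tasks:
        w_bar, loss = inner_ogd(theta, stream, lam, eta)
        losses.append(loss)
        # Approximate meta-gradient: the exact one uses the regularized
        # empirical risk minimizer in place of w_bar.
        meta_grad = -lam * (w_bar - theta)
        theta = theta - gamma * meta_grad
    return theta, losses

# Synthetic tasks whose target vectors cluster around a common center.
rng = np.random.default_rng(0)
dim, n_tasks, n_points = 2, 100, 20
center = np.array([2.0, -1.0])
basis = np.eye(dim)
tasks = []
for _ in range(n_tasks):
    w_task = center + rng.uniform(-0.1, 0.1, size=dim)
    stream = [(basis[i % dim], float(basis[i % dim] @ w_task))
              for i in range(n_points)]
    tasks.append(stream)

theta, losses = meta_learn_bias(tasks, dim)
print("learned bias:", theta)
print("first vs last task cumulative loss:", losses[0], losses[-1])
```

In this toy setup the learned bias drifts toward the common center of the task vectors, so later tasks start closer to their solution and incur a smaller cumulative loss than the first one.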

inner algorithm, name change, online-within-online meta-learning, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent

Neural Information Processing Systems | Dec-24-2025, 18:13:15 GMT

A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the $\Phi$-Hedge algorithm, a generic algorithm capable of learning a large class of equilibria for NFGs. We show that $\Phi$-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the $\Phi$-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time. This new connection further allows us to design and analyze a new class of OMD algorithms based on modifying its log-partition function. In particular, we design an improved algorithm with balancing techniques that achieves a sharp $\widetilde{\mathcal{O}}(\sqrt{XAT})$ EFCE-regret under bandit feedback in an EFG with $X$ information sets, $A$ actions, and $T$ episodes. To the best of our knowledge, this is the first such rate and matches the information-theoretic lower bound.

efficient phi-regret minimization, extensive-form game, online mirror descent, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Population-aware Online Mirror Descent for Mean-Field Games with Common Noise by Deep Reinforcement Learning

Wu, Zida, Lauriere, Mathieu, Geist, Matthieu, Pietquin, Olivier, Mehta, Ankur

arXiv.org Artificial Intelligence | Sep-4-2025

Mean Field Games (MFGs) offer a powerful framework for studying large-scale multi-agent systems. Yet, learning Nash equilibria in MFGs remains a challenging problem, particularly when the initial distribution is unknown or when the population is subject to common noise. In this paper, we introduce an efficient deep reinforcement learning (DRL) algorithm designed to achieve population-dependent Nash equilibria without relying on averaging or historical sampling, inspired by Munchausen RL and Online Mirror Descent. The resulting policy is adaptable to various initial distributions and sources of common noise. Through numerical experiments on seven canonical examples, we demonstrate that our algorithm exhibits superior convergence properties compared to state-of-the-art algorithms, particularly a DRL version of Fictitious Play for population-dependent policies. The performance in the presence of common noise underscores the robustness and adaptability of our approach.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2509.0303

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Temporal Variability in Implicit Online Learning

Neural Information Processing Systems | Aug-15-2025, 03:15:04 GMT

In the setting of online learning, implicit algorithms turn out to be highly successful from a practical standpoint. However, the tightest regret analyses only show marginal improvements over Online Mirror Descent. In this work, we shed light on this behavior by carrying out a careful regret analysis. We prove a novel static regret bound that depends on the temporal variability of the sequence of loss functions, a quantity which is often encountered when considering dynamic competitors. We show, for example, that the regret can be constant if the temporal variability is constant and the learning rate is tuned appropriately, without requiring smooth losses. Moreover, we present an adaptive algorithm that achieves this regret bound without prior knowledge of the temporal variability, and we prove a matching lower bound.
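
The difference between an implicit update and the usual linearized (explicit) step can be made concrete in the Euclidean case. The sketch below is a toy comparison, not the adaptive algorithm from the abstract: it assumes a squared loss, for which the implicit step `argmin_y loss(y) + ||y - x||^2 / (2 * eta)` has a closed form, and it uses a deliberately oversized learning rate on a fixed, noise-free data stream. All names and the data are assumptions made for illustration.

```python
import numpy as np

def explicit_step(x, a, b, eta):
    """Linearized (explicit) step on the loss 0.5 * (a @ x - b) ** 2."""
    return x - eta * (a @ x - b) * a

def implicit_step(x, a, b, eta):
    """Implicit step: argmin_y 0.5 * (a @ y - b) ** 2 + ||y - x||^2 / (2 * eta).
    For the squared loss the minimizer has this closed form."""
    return x - eta * (a @ x - b) / (1.0 + eta * (a @ a)) * a

def run(step, data, eta, dim):
    """Play the online game from x = 0 and return (final point, cumulative loss)."""
    x = np.zeros(dim)
    total = 0.0
    for a, b in data:
        total += 0.5 * (a @ x - b) ** 2
        x = step(x, a, b, eta)
    return x, total

dim = 3
target = np.array([1.0, -2.0, 0.5])
basis = np.eye(dim)
data = []
for t in range(30):
    a = 2.0 * basis[t % dim]            # cycle through scaled coordinate directions
    data.append((a, float(a @ target)))

eta = 1.0                               # too large for the explicit step here
x_explicit, loss_explicit = run(explicit_step, data, eta, dim)
x_implicit, loss_implicit = run(implicit_step, data, eta, dim)
print("explicit cumulative loss:", loss_explicit)
print("implicit cumulative loss:", loss_implicit)
```

With this learning rate the explicit iterates overshoot and move away from the target on every visit to a coordinate, while the implicit iterates contract toward it, which is one informal way to see why implicit updates tend to be less sensitive to the learning rate.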

algorithm, loss function, optimization, (15 more...)

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > Canada (0.04)
Europe > France (0.04)

Industry: Education > Educational Setting > Online (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)

Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent

Neural Information Processing Systems | Jan-17-2025, 15:44:17 GMT

A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the $\Phi$-Hedge algorithm, a generic algorithm capable of learning a large class of equilibria for NFGs. We show that $\Phi$-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the $\Phi$-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time.

algorithm, extensive-form game, online mirror descent, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Online Mirror Descent for Tchebycheff Scalarization in Multi-Objective Optimization

Liu, Meitong, Zhang, Xiaoyuan, Xie, Chulin, Donahue, Kate, Zhao, Han

arXiv.org Artificial Intelligence | Nov-11-2024

The goal of multi-objective optimization (MOO) is to learn under multiple, potentially conflicting, objectives. One widely used technique to tackle MOO is linear scalarization, where one fixed preference vector is used to combine the objectives into a single scalar value for optimization. However, recent work (Hu et al., 2024) has shown that linear scalarization often fails to capture the non-convex regions of the Pareto Front and therefore cannot recover the complete set of Pareto optimal solutions. In light of these limitations, this paper focuses on Tchebycheff scalarization, which optimizes for the worst-case objective. In particular, we propose an online mirror descent algorithm for Tchebycheff scalarization, which we call OMD-TCH. We show that OMD-TCH enjoys a convergence rate of $O(\sqrt{\log m/T})$ where $m$ is the number of objectives and $T$ is the number of iteration rounds. We also propose a novel adaptive online-to-batch conversion scheme that significantly improves the practical performance of OMD-TCH while maintaining the same convergence guarantees. We demonstrate the effectiveness of OMD-TCH and the adaptive conversion scheme on both synthetic problems and federated learning tasks under fairness constraints, showing state-of-the-art performance.
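
One way to picture an online mirror descent treatment of the worst-case objective is through the identity `max_i f_i(x) = max over lam in the simplex of sum_i lam_i * f_i(x)`. The sketch below is a guess at the general shape, not the authors' OMD-TCH implementation: it runs gradient descent on `x` and an entropic mirror ascent (exponentiated gradient) step on the weights `lam`, and returns the uniformly averaged iterate as a plain online-to-batch conversion. Preference weights and reference points are omitted, and the two quadratic objectives, step sizes, and names are assumptions.

```python
import numpy as np

def omd_tch_sketch(fs, grads, x0, steps=2000, eta_x=0.05, eta_lam=0.05):
    """Simultaneous updates for min_x max_{lam in simplex} sum_i lam_i * f_i(x)."""
    m = len(fs)
    x = np.asarray(x0, dtype=float).copy()
    lam = np.full(m, 1.0 / m)           # start from uniform weights
    x_sum = np.zeros_like(x)
    for _ in range(steps):
        x_sum += x
        values = np.array([f(x) for f in fs])
        grad_x = sum(l * g(x) for l, g in zip(lam, grads))
        # Descent step on x for the lam-weighted objective.
        x = x - eta_x * grad_x
        # Entropic mirror ascent on lam: upweight the currently worst objectives.
        logits = np.log(lam) + eta_lam * values
        logits -= logits.max()          # numerical stability
        lam = np.exp(logits)
        lam /= lam.sum()
    return x_sum / steps, lam

# Two conflicting toy objectives; the worst-case optimum is x = 0 with value 1.
fs = [lambda x: float((x[0] - 1.0) ** 2), lambda x: float((x[0] + 1.0) ** 2)]
grads = [lambda x: 2.0 * (x - 1.0), lambda x: 2.0 * (x + 1.0)]

x_avg, lam_final = omd_tch_sketch(fs, grads, x0=[2.0])
print("averaged iterate:", x_avg)
print("final weights:", lam_final)
print("worst-case objective:", max(f(x_avg) for f in fs))
```

The weight vector `lam` plays the role of the online learner on the simplex, which is where a $\log m$ dependence on the number of objectives typically enters the analysis of entropic mirror descent.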

fedavg, omdgd-tch, tchebycheff scalarization, (11 more...)

arXiv.org Artificial Intelligence

2410.21764

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Online-Within-Online Meta-Learning

Neural Information Processing Systems | Oct-11-2024, 02:57:44 GMT

We study the problem of learning a series of tasks in a fully online Meta-Learning setting. The goal is to exploit similarities among the tasks to incrementally adapt an inner online algorithm in order to incur a low averaged cumulative error over the tasks. We focus on a family of inner algorithms based on a parametrized variant of online Mirror Descent. The inner algorithm is incrementally adapted by an online Mirror Descent meta-algorithm using the corresponding within-task minimum regularized empirical risk as the meta-loss. In order to keep the process fully online, we approximate the meta-subgradients by the online inner algorithm.

inner algorithm, online mirror descent, online-within-online meta-learning, (1 more...)

Genre: Instructional Material > Online (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

On the Universality of Online Mirror Descent

Neural Information Processing Systems | Mar-15-2024, 07:59:39 GMT

We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee.
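
For readers unfamiliar with the method, a standard textbook instance of Mirror Descent is sketched here: online linear optimization over the probability simplex with the negative-entropy mirror map, which gives the exponentiated-gradient (Hedge) update. This is only one special case and says nothing about the general result above; the random loss sequence and the function name are assumptions. The learning rate is the usual tuning for losses in [0, 1], under which the regret against the best fixed coordinate is at most `sqrt(2 * T * log(d))`.

```python
import numpy as np

def omd_entropic(loss_vectors, eta):
    """Online Mirror Descent on the simplex with the negative-entropy mirror map.
    Returns the final point and the regret against the best fixed coordinate."""
    losses = np.asarray(loss_vectors, dtype=float)
    n_rounds, d = losses.shape
    x = np.full(d, 1.0 / d)             # uniform starting point
    total = 0.0
    for g in losses:
        total += float(x @ g)           # linear loss <x_t, g_t>
        # Mirror step in the dual (log) space, then normalize back to the simplex.
        logits = np.log(x) - eta * g
        logits -= logits.max()          # numerical stability
        x = np.exp(logits)
        x /= x.sum()
    best_fixed = losses.sum(axis=0).min()
    return x, total - best_fixed

rng = np.random.default_rng(0)
T, d = 2000, 5
losses = rng.uniform(0.0, 1.0, size=(T, d))
losses[:, 2] *= 0.5                     # coordinate 2 is better on average

eta = np.sqrt(2.0 * np.log(d) / T)
x_final, regret = omd_entropic(losses, eta)
print("final point:", x_final)
print("regret:", regret, "bound:", np.sqrt(2.0 * T * np.log(d)))
```

Swapping the negative entropy for the squared Euclidean norm turns the same template into projected online gradient descent, which is the sense in which the mirror map is the main design choice.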

algorithm, banach space, mirror descent, (14 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Industry: Education (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

Wu, Zida, Lauriere, Mathieu, Chua, Samuel Jia Cong, Geist, Matthieu, Pietquin, Olivier, Mehta, Ankur

arXiv.org Artificial Intelligence | Mar-6-2024

Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves population-dependent Nash equilibrium without the need for averaging or sampling from history, inspired by Munchausen RL and Online Mirror Descent. Through the design of an additional inner-loop replay buffer, the agents can effectively learn to achieve Nash equilibrium from any distribution, mitigating catastrophic forgetting. The resulting policy can be applied to various initial distributions. Numerical experiments on four canonical examples demonstrate our algorithm has better convergence properties than SOTA algorithms, in particular a DRL version of Fictitious Play for population-dependent policies.

algorithm, initial distribution, iteration, (14 more...)

arXiv.org Artificial Intelligence

2403.03552

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
